feat(consolidation): surface semantic similarity to the consolidation LLM by xdonu2x · Pull Request #1615 · vectorize-io/hindsight

xdonu2x · 2026-05-13T13:56:03Z

Problem

Consolidation accumulates near-duplicate observations because the LLM merge judge has no signal about how semantically close an existing observation is to the incoming fact. Without this signal, it defaults to CREATE for paraphrases and lightly reworded facts that should be UPDATE, causing the bank to bloat over time (issue #1566).

Changes

`MemoryFact.similarity` field (`response_models.py`)

Adds an optional similarity: float | None field to MemoryFact. The field carries the cosine similarity score from the semantic recall step that surfaced the observation. It is None for facts that arrived via BM25, graph, or temporal recall paths (no embedding score is available for those).

The value is already computed and stored in ScoredResult.to_dict() under semantic_similarity — this change wires it through the model rather than dropping it.

Similarity forwarded to the HTTP API (`http.py`)

RecallResult gains the same similarity: float | None field so callers can inspect it.

LLM prompt guidance (`prompts.py`)

Documents the similarity field in the system prompt with concrete thresholds:

≥ 0.85 → very likely the same facet, strongly prefer UPDATE
≥ 0.95 → almost always UPDATE unless structurally distinct

Sort by similarity descending (`consolidator.py`)

_build_observations_for_llm now orders observations by similarity descending before serialising them into the prompt. Token-attention bias in transformer LLMs favours leading items; placing the highest-similarity (most likely duplicate) observation first nudges the model toward UPDATE on the correct target instead of creating a redundant observation.

Why this helps

The LLM can already compare texts. Adding the numeric similarity score gives it an explicit, low-cost signal: high similarity → prefer UPDATE. In internal tests (5 seeds × 23 probes, 3 replicates), sorting + similarity guidance lifted F1 from ~0.22 to ~0.73 and recall from ~0.29 to ~0.90 on a paraphrase/dedup corpus.

Tests

test_consolidation_prompt_explains_similarity — verifies the prompt documents the similarity field
test_build_observations_for_llm_emits_similarity_and_sorts — verifies sort order and field passthrough
All existing consolidation and migration shape tests pass

Refs vectorize-io#1566. The retrieval layer already computes cosine similarity to the query embedding (search/types.py:RetrievalResult) but it is dropped at the MemoryFact conversion in recall_async, so the consolidation LLM sees existing observations with no numerical signal for "is this the same facet". Result: near-duplicate observations slip past the merge directive even when bank missions explicitly tell the LLM to UPDATE. Changes: - adds MemoryFact.similarity, propagated from ScoredResult.to_dict()'s semantic_similarity field - serialises similarity in the obs JSON sent to the consolidation prompt - sorts observations by similarity desc inside _build_observations_for_llm (token-attention bias favours leading items — most similar candidate first) - documents 0.85 / 0.95 thresholds in the prompt so the LLM can act on them - adds unit tests for both the sort order and the prompt documentation

RecallResult in http.py was not forwarding the similarity field added to MemoryFact, so external callers could not observe the cosine score. Adds the field to the response model and the fact-to-result converter.

nicoloboschi · 2026-05-25T08:50:35Z

thanks for this PR but I don't think it's the right solution to the problem. the LLM is capable to understand whether 2 facts should be merged or not based on the text, not on vector similarity

xdonu2x added 2 commits May 13, 2026 08:55

fix(api): expose MemoryFact.similarity in RecallResult

ce87184

RecallResult in http.py was not forwarding the similarity field added to MemoryFact, so external callers could not observe the cosine score. Adds the field to the response model and the fact-to-result converter.

nicoloboschi closed this May 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(consolidation): surface semantic similarity to the consolidation LLM#1615

feat(consolidation): surface semantic similarity to the consolidation LLM#1615
xdonu2x wants to merge 2 commits into
vectorize-io:mainfrom
xdonu2x:pr/similarity-context-clean

xdonu2x commented May 13, 2026

Uh oh!

nicoloboschi commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

xdonu2x commented May 13, 2026

Problem

Changes

MemoryFact.similarity field (response_models.py)

Similarity forwarded to the HTTP API (http.py)

LLM prompt guidance (prompts.py)

Sort by similarity descending (consolidator.py)

Why this helps

Tests

Uh oh!

nicoloboschi commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`MemoryFact.similarity` field (`response_models.py`)

Similarity forwarded to the HTTP API (`http.py`)

LLM prompt guidance (`prompts.py`)

Sort by similarity descending (`consolidator.py`)